Data Science Workbench - Building Your Model

Retailers with analytics teams and data scientists can define new algorithms and test them against the 200+ road-tested strategies that are included in the standard Recommend license. 

After you've created a table to be the model behind a strategy, you can publish the strategy and use it in your site placements just like any other strategy.

Your site must be configured to use the Data Science Workbench and your FTP credentials must be created. Contact your Algonomy team if either of these is not the case.

There are size limits per strategy: a query or strategy should not exceed 1 GB. Anything over this limit will be flagged and monitored, and may be revised for performance reasons.

Basic Workflow

  1. Build a table that translates your recommendation idea into an algorithm.
    • Who: technical team (SQL experience highly recommended)
    • Where: Personalization Dashboard > Optimization > Data Science Workbench > Data Source > Query Editor

  2. Create a strategy and publish it.
    • Who: technical team/administrators
    • Where: Personalization Dashboard > Optimization > Data Science Workbench > Strategies > New Strategy

For more information on how to create your strategy, see Creating Strategies in the Data Science Workbench.

  3. Use the strategy in your recommendations.
    • Who: administrators
    • Where: Personalization Dashboard > Recommendations > Strategy Configuration > Production

Building a Table in the Query Editor

The first step in building your own strategy is to assemble the data you need into a table. 

The query editor gives you access to your site's clickstream data, as well as any other data you've brought in through Build. You can also use Hive to build your table.

Tips

  • Create your table in the work database.
  • Keep in mind that strategies always use key values associated with product IDs and scores. See Strategy Structure for the details.
  • If you are planning to use the scheduling functionality, make sure your SQL query and table exist in Hive.
  • Catalog data is loaded into the Data Science Workbench once per day, based on a snapshot of the catalog at the time of the update.
  • If your strategy fails to build, verify that the associated query contains no comments. Comments can cause the query to hang, which prevents the model from building and causes the strategy to fail.
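
The last tip can be checked before you paste a query into the Query Editor. The following is a minimal sketch, not part of the Data Science Workbench: the function name find_sql_comments is hypothetical, and it assumes comments are written as "--" line comments or "/* */" block comments. It skips quoted strings that use doubled quotes for escaping, but it does not handle backslash-escaped quotes.

```python
def find_sql_comments(query):
    """Return the comments found in a SQL query, ignoring text inside quoted strings.

    Hypothetical helper: assumes '--' line comments and '/* */' block comments.
    """
    comments = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch in ("'", '"'):
            # Skip a quoted string so that '--' inside a literal is not flagged.
            i += 1
            while i < n and query[i] != ch:
                i += 1
            i += 1  # step past the closing quote
        elif query.startswith("--", i):
            # Line comment: runs to the end of the line (or of the query).
            end = query.find("\n", i)
            end = n if end == -1 else end
            comments.append(query[i:end])
            i = end
        elif query.startswith("/*", i):
            # Block comment: runs to the closing '*/' (or to the end if unclosed).
            end = query.find("*/", i + 2)
            end = n if end == -1 else end + 2
            comments.append(query[i:end])
            i = end
        else:
            i += 1
    return comments


query = "SELECT a -- pick a\nFROM t /* block */ WHERE b = '--not a comment'"
found = find_sql_comments(query)
```

An empty result means the query contains no comments of these two kinds; anything else lists the text to remove before saving the query.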

Strategy Structure

Custom strategies always return lists of product IDs. They can use any one of these key values:  

  • product category ID 
  • product ID
  • product brand ID
  • customer segment ID
  • user ID

The key value is used as the seed for the strategy: what piece of information from the page should the strategy use to choose which products to recommend? 

Sitewide Strategy Tables

Sitewide strategies require a table that contains a column labeled 'sitewide', and that column must be selected as the key for the strategy. Otherwise, the sitewide strategy will not work.

Composable Strategy Tables

Composable strategies use more than one key value as the seed. For this to function properly, the table behind the strategy must have all keys in one column, separated by semicolons. The columns must also be in the following order: Key, Product ID, Score.

For example, for a strategy with a user and category key, the table should look something like this:

Key | Product ID | Score
user_id;category_id | product_id | Score
1234;p-13387 | p4595013 | 0.34
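
As an illustration of that layout, here is a minimal sketch that assembles one row for a composable strategy. The function names composable_key and composable_row are hypothetical, and rejecting key values that already contain a semicolon is an assumption made here to keep the combined key unambiguous; it is not a documented rule.

```python
def composable_key(*key_values):
    """Join several key values into the single semicolon-separated Key column."""
    parts = [str(value) for value in key_values]
    for part in parts:
        # Assumption: a semicolon inside a key value would make the key ambiguous.
        if ";" in part:
            raise ValueError(f"key value contains a semicolon: {part!r}")
    return ";".join(parts)


def composable_row(key_values, product_id, score):
    """Build one row in the required column order: Key, Product ID, Score."""
    return (composable_key(*key_values), str(product_id), float(score))


# Same values as the example row: user 1234, category p-13387.
row = composable_row(("1234", "p-13387"), "p4595013", 0.34)
```

Here row holds the key "1234;p-13387", the product ID "p4595013", and the score 0.34, in that order.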

Definition and File Format

Custom strategies are defined by a simple table with three columns:

Field | Format | Description
Key | Alphanumeric | The input ("seed") for the strategy. The seed value is provided through instrumentation for each call that potentially uses the strategy. The key is one of the following:
    • customer segment ID: A unique identifier for a customer segment, as instrumented on your site.
    • product ID: A unique identifier for the item (sometimes called the item ID). See the Relevance Cloud Developers Site for a detailed description.
    • product brand ID: A unique identifier for a brand, often instrumented in order to enable brand filtering.
    • product category ID: The category ID associated with a product or set of products, used to provide category context.
    • user ID: A user ID, generally connected to an email address, is a direct link to the login credentials a consumer uses to log in to and buy products from an e-commerce site. See the Relevance Cloud Developers Site for a detailed description.
Product | Alphanumeric | Product ID (from your product catalog data feed).
Score | Numeric | Scores should be sorted in descending order.
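
The three-column definition can be sanity-checked before a table is used for a strategy. The sketch below is illustrative only: order_strategy_rows is a hypothetical helper, and grouping rows by key before sorting scores in descending order is an assumption about how the ordering is meant to apply.

```python
def order_strategy_rows(rows):
    """Validate (key, product_id, score) rows and sort them by key, then score descending."""
    cleaned = []
    for row in rows:
        if len(row) != 3:
            raise ValueError(f"expected 3 columns (Key, Product, Score), got {len(row)}")
        key, product_id, score = row
        # Key and Product are alphanumeric; Score must be numeric.
        cleaned.append((str(key), str(product_id), float(score)))
    # Assumption: scores are ordered within each key, highest first.
    return sorted(cleaned, key=lambda r: (r[0], -r[2]))


rows = [
    ("cat1", "p2", 0.2),
    ("cat1", "p1", 0.9),
    ("cat0", "p3", 0.5),
]
ordered = order_strategy_rows(rows)
```

In this example, the cat0 row comes first, followed by the two cat1 rows with the 0.9 score ahead of the 0.2 score.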

Data Updates

Catalog Updates: Catalog data is loaded into the Data Science Workbench (DSW) once a day, starting at roughly 2:30 AM PT. The data for each site comes from a Hadoop extract of the catalog that is generated three times a day. The load is not directly connected to feed processing.

DSW Data Updates: The timing of scheduled updates varies with how long the query takes to execute. Once the query has finished, the extract and model build are relatively quick, so updates generally complete shortly after the scheduled update time.